
[Modular] More Updates for Custom Code Loading #11969


Merged
merged 20 commits into main, Aug 11, 2025

Conversation

DN6
Collaborator

@DN6 DN6 commented Jul 21, 2025

What does this PR do?

In order to support custom code for any block type, we have to invoke downloading/loading of custom code from the ModularPipelineBlocks object. This means some of the properties/methods defined in PipelineBlock, SequentialPipelineBlocks, etc. have to be moved into the ModularPipelineBlocks class.

Additionally, this PR includes a potential change to consolidate how the block state is created from inputs and intermediate inputs. The change is necessary to support the case where a pipeline input is created in an intermediate step, e.g. using a segmentation model to create an inpainting mask. With the existing approach, simply setting the output of the mask-creation step to the desired input name mask_image wouldn't allow downstream steps to access it in the block state, because get_block_state runs over required inputs first and errors out because the value is missing. The proposal is to check for required values in both inputs and intermediates before erroring out. There could be edge cases here that I might be missing, but it seems safe to consolidate in this way.
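As a rough sketch, the proposed check could look like this (hypothetical helper, not the actual get_block_state code; shown only to illustrate the lookup described above):

def _resolve_required(name, inputs, intermediates):
    # Accept the value from either the user-provided inputs or the intermediates
    # produced by earlier blocks; only error if it is missing from both.
    if name in inputs and inputs[name] is not None:
        return inputs[name]
    if name in intermediates:
        return intermediates[name]
    raise ValueError(f"Required value `{name}` not found in inputs or intermediates.")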

Code to test

import torch
from diffusers.modular_pipelines import ModularPipelineBlocks, SequentialPipelineBlocks
from diffusers.modular_pipelines.stable_diffusion_xl import INPAINT_BLOCKS
from diffusers.utils import load_image

# fetch the Florence2 image annotator block that will create our mask
image_annotator_block = ModularPipelineBlocks.from_pretrained(
    "diffusers-internal-dev/florence2-image-annotator",
    trust_remote_code=True,
)

my_blocks = INPAINT_BLOCKS.copy()
# insert the annotation block before the image encoding step
my_blocks.insert("image_annotator", image_annotator_block, 1)

# Create our initial set of inpainting blocks
blocks = SequentialPipelineBlocks.from_blocks_dict(my_blocks)

repo_id = "YiYiXu/modular-loader-t2i-0704"
pipe = blocks.init_pipeline(repo_id)
pipe.load_default_components(torch_dtype=torch.float16, device_map="cuda", trust_remote_code=True)

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true")
image = image.resize((1024, 1024))

batch_size = 1
image = [image] * batch_size

prompt = ["A red car"] * batch_size
annotation_prompt = ["<REFERRING_EXPRESSION_SEGMENTATION>the car"] * batch_size

output = pipe(
    prompt=prompt,
    image=image,
    annotation_task_prompt=annotation_prompt,
    num_inference_steps=35,
    guidance_scale=7.5,
    strength=0.9,
    output_type="pil",
)
output.intermediates["mask_image"][0].save("annotated_mask_image.png")
output.intermediates["images"][0].save("annotated.png")


Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@DN6 DN6 requested a review from yiyixuxu July 21, 2025 17:01

@yiyixuxu
Collaborator

yiyixuxu commented Jul 21, 2025

Additionally, this PR includes a potential change to consolidate how the block state is created from inputs and intermediate inputs. The change is necessary to support the case where a pipeline input is created in an intermediate step

I see. Do you think maybe we should drop the concept difference between "inputs" and "intermediate inputs" altogether? All inputs could just be "intermediates", and they can all be modified. https://huggingface.co/docs/diffusers/main/en/modular_diffusers/modular_diffusers_states

Taking the example of mask_image, it seems like we really just need to declare it as an intermediate_input in all downstream blocks: if it is an intermediate input, the downstream blocks will first look to see if a previous block provides it; only if it is not in the intermediate state will they look into the input state to use the value provided by the user - exactly what we need here.

If we do this for mask_image, we probably need to do it for image too, and the same goes for all other user inputs.
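A minimal sketch of that lookup order (hypothetical helper names, not the actual diffusers code):

def _resolve_intermediate_input(name, intermediates, user_inputs):
    # Prefer the value produced by an upstream block ...
    if name in intermediates:
        return intermediates[name]
    # ... and only fall back to the user-provided input if no block set it.
    return user_inputs.get(name)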

We'd need to be a little more mindful when modifying variables, and recommend always using a new variable name, or not adding to block_state, unless the intent is to replace the value.

e.g. if a downstream block needs the raw image, a block like this would mean it would not be able to access the raw image before processing:

def __call__(self, components, state):
    ...
    block_state.image = process(block_state.image)
    block_state.image_latent = prepare_latents(block_state.image)
    ...

but this might be fine (depending on whether image is mutable and whether the process function modifies it in place):

def __call__(self, components, state):
    ...
    image = process(block_state.image)
    block_state.image_latent = prepare_latents(image)
    ...

On the other hand, the system would be more flexible, and it also simplifies things a bit conceptually (I found it not easy to explain the difference between these two states).

Let me know what you think!

@DN6
Collaborator Author

DN6 commented Jul 22, 2025

I see. Do you think maybe we should drop the concept difference between "inputs" and "intermediate inputs" altogether? All inputs could just be "intermediates", and they can all be modified. https://huggingface.co/docs/diffusers/main/en/modular_diffusers/modular_diffusers_states

IMO this makes sense since we more or less put them into the same group when fetching the block state and they are accessed in the block in the same way. I can't think of any edge cases where this might lead to issues.

Taking the example of mask_image, it seems like we really just need to declare it as an intermediate_input in all downstream blocks: if it is an intermediate input, the downstream blocks will first look to see if a previous block provides it; only if it is not in the intermediate state will they look into the input state to use the value provided by the user - exactly what we need here.

We could do this, but suppose you want to insert a custom block that manipulates an input value into a set of existing blocks. You would have to update all subsequent blocks to point to the intermediate input.

def __call__(self, components, state):
    ...
    image = process(block_state.image)
    block_state.image_latent = prepare_latents(image)
    ...

I think this works well as a best practice if we want to leave the input unchanged. IMO the less restrictive we are the better, since there isn't a very strong reason to keep the input types separated?

@@ -322,7 +322,7 @@ class ModularPipelineBlocks(ConfigMixin, PushToHubMixin):
</Tip>
"""

-    config_name = "config.json"
+    config_name = "modular_config.json"
Collaborator Author

@DN6 DN6 Jul 22, 2025

For consistency with modular_model_index.json. Also, there could be cases where a repo contains model weights/config files and a modular pipeline block to load the model. We can avoid conflicts between the configs this way.

@yiyixuxu
Collaborator

yiyixuxu commented Jul 22, 2025

IMO this makes sense since we more or less put them into the same group when fetching the block state and they are accessed in the block in the same way. I can't think of any edge cases where this might lead to issues.

Sounds good! Let's do this then! I think it will simplify the code a lot. Basically now we'll just have inputs and outputs.
I'm happy to take on this one since I'm more familiar with the code, but let me know if you want to work on this :)

@yiyixuxu
Collaborator

Another thing: should we remove the concept of a single PipelineBlock? It seems like you moved all the methods unique to PipelineBlock to the base ModularPipelineBlocks, and we just don't need it anymore :)
(I haven't tested it out though, but I think it would be great if it's the case)

@DN6
Collaborator Author

DN6 commented Jul 23, 2025

@yiyixuxu
I can make the following updates to this PR

  1. Consolidate intermediate inputs/inputs and outputs
  2. Remove PipelineBlock

Then you can review and refactor as you see fit?

@yiyixuxu
Collaborator

@DN6 sounds good!

@DN6 DN6 changed the title [Modular] More Updates for Custom Code Loading [WIP Modular] More Updates for Custom Code Loading Jul 28, 2025
Comment on lines +79 to +80
values: Dict[str, Any] = field(default_factory=dict)
kwargs_mapping: Dict[str, List[str]] = field(default_factory=dict)
Collaborator Author

Since we've removed the distinction between inputs and intermediates, we can perhaps simplify PipelineState as well.

All values are stored under values and all kwargs_type groupings under kwargs_mapping. Setting and getting go through set/get methods that handle both single strings and lists for setting/fetching.
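A rough sketch of what that simplified state could look like (illustrative only; the field names are taken from the snippet above, everything else is assumed):

from dataclasses import dataclass, field
from typing import Any, Dict, List, Union

@dataclass
class SimplePipelineState:
    # everything lives in `values`; kwargs_type groupings live in `kwargs_mapping`
    values: Dict[str, Any] = field(default_factory=dict)
    kwargs_mapping: Dict[str, List[str]] = field(default_factory=dict)

    def set(self, names: Union[str, List[str]], new_values: Union[Any, List[Any]]):
        # accept a single name/value pair or two parallel lists
        if isinstance(names, str):
            names, new_values = [names], [new_values]
        for name, value in zip(names, new_values):
            self.values[name] = value

    def get(self, names: Union[str, List[str]], default: Any = None):
        # return one value for a string, a list of values for a list of names
        if isinstance(names, str):
            return self.values.get(names, default)
        return [self.values.get(name, default) for name in names]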

Comment on lines +267 to +277
@property
def intermediate_outputs(self) -> List[OutputParam]:
"""List of intermediate output parameters. Must be implemented by subclasses."""
return []

def _get_outputs(self):
return self.intermediate_outputs

@property
def outputs(self) -> List[OutputParam]:
return self._get_outputs()
Collaborator Author

I think these can also be consolidated into just outputs? Didn't do it here to keep the PR scope limited.

Collaborator

Yes to both:
- we should only have outputs
- we can do it in a next PR: currently outputs are not actually used, so we just have to remove outputs from the code base and then change intermediate_outputs to outputs

@DN6
Collaborator Author

DN6 commented Jul 30, 2025

@yiyixuxu made the changes as discussed. LMK your thoughts.

@yiyixuxu
Collaborator

yiyixuxu commented Aug 4, 2025

so "image" is one of the inputs that we override

e.g. here we update it with the processed image:

block_state.image = components.image_processor.preprocess(

but we may need the raw input later, e.g.

block_state.mask_image, block_state.image, i, block_state.crops_coords

should we rename it to something else?

@yiyixuxu
Collaborator

yiyixuxu commented Aug 4, 2025

btw, this is a slow test I ran for SDXL; currently tests 12, 14, 15, 16 fail. Should be easy to fix (missed a few intermediate_inputs, etc.)

Can you fix them and make sure these tests are able to run functionally? Don't worry about generation, I can double check before merge.

Test script
# test modular pipeline (slower test)


import os
import shutil

import torch

from diffusers import (
    ControlNetModel,
    UNet2DConditionModel,
    AutoencoderKL,
    ControlNetUnionModel,
    AdaptiveProjectedGuidance,
    ClassifierFreeGuidance,
    PerturbedAttentionGuidance,
    LayerSkipConfig,
    ModularPipeline,
)
from diffusers import StableDiffusionXLAutoBlocks, ComponentsManager, ComponentSpec
from diffusers.modular_pipelines.stable_diffusion_xl import StableDiffusionXLAutoIPAdapterStep

from transformers import CLIPVisionModelWithProjection, CLIPImageProcessor

import logging
logging.getLogger().setLevel(logging.INFO)
logging.getLogger("diffusers").setLevel(logging.INFO)


# define device and dtype
device = "cuda:3"
dtype = torch.float16
num_images_per_prompt = 1

# test related parameters
test_lora = False
tests_to_run = [1,2,3,4,5,6,7,8,9,10,11,12,13,14, 15,16]


# define output folder
out_folder = "modular_test_outputs"
os.makedirs(out_folder, exist_ok=True)

# functions for memory info
def reset_memory():
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

def clear_memory():
    torch.cuda.empty_cache()

def print_memory(message=None):
    """
    Print detailed GPU memory statistics for the globally configured `device`.

    Args:
        message (str, optional): label appended to the printed header
    """

    def print_mem(mem_size, name):
        mem_gb = mem_size / 1024**3
        mem_mb = mem_size / 1024**2
        print(f"- {name}: {mem_gb:.2f} GB ({mem_mb:.2f} MB)")

    allocated_mem = torch.cuda.memory_allocated(device)
    reserved_mem = torch.cuda.memory_reserved(device)
    mem_on_device = torch.cuda.mem_get_info(device)[0]
    peak_mem = torch.cuda.max_memory_allocated(device)

    print(f"\nGPU:{device} Memory Status {message}:")
    print_mem(allocated_mem, "allocated memory")
    print_mem(reserved_mem, "reserved memory")
    print_mem(peak_mem, "peak memory")
    print_mem(mem_on_device, "mem on device")


    
# (1)Define inputs
# prompts
prompt = "a bear sitting in a chair drinking a milkshake"
negative_prompt = "deformed, ugly, wrong proportion, low res, bad anatomy, worst quality, low quality"
# image urls
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
inpaint_img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
inpaint_mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
ip_adapter_image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner.png"
# strength/scale etc
strength = 0.9 #img2img strength
inpaint_strength = 0.99 #inpainting strength
controlnet_conditioning_scale = 0.5  # recommended for good generalization

# get all the image inputs(use a custom block to get images to prepare them)
get_image_step = ModularPipeline.from_pretrained("YiYiXu/image_inputs_0803", trust_remote_code=True)

init_image = get_image_step(image_url=url,output="image")
control_image = get_image_step(image_url=url, processor_id="canny",output="image")
controlnet_union_image = get_image_step(image_url=url, processor_id="lineart_anime",output="image")
inpaint_image = get_image_step(image_url=inpaint_img_url, size=(1024, 1024),output="image")
inpaint_mask = get_image_step(image_url=inpaint_mask_url, size=(1024, 1024),output="image")
ip_adapter_image = get_image_step(image_url=ip_adapter_image_url,output="image")



# (2) create pipelines  
auto_blocks = StableDiffusionXLAutoBlocks()
refiner_blocks = StableDiffusionXLAutoBlocks()


# (3) define model components needed for the tests

# specs
refiner_spec = ComponentSpec(name="refiner", type_hint=UNet2DConditionModel, repo="stabilityai/stable-diffusion-xl-refiner-1.0", subfolder="unet")
inpaint_spec = ComponentSpec(name="inpaint", type_hint=UNet2DConditionModel, repo="diffusers/stable-diffusion-xl-1.0-inpainting-0.1", subfolder="unet")
controlnet_union_spec = ComponentSpec(name="controlnet_union", type_hint=ControlNetUnionModel, repo="brad-twinkl/controlnet-union-sdxl-1.0-promax")
# repos
ip_adapter_repo = "h94/IP-Adapter"
modular_repo = "YiYiXu/modular_demo"
# create guiders: pag/cfg/apg
pag_guider_spec_config = {
    "guidance_scale": 5.0,
    "perturbed_guidance_scale": 3.0,
    "perturbed_guidance_config": LayerSkipConfig(
        indices=[2, 3, 7, 8],
        fqn="mid_block.attentions.0.transformer_blocks",
        skip_attention=False,
        skip_ff=False,
        skip_attention_scores=True,
    ),
    "start": 0.0,
    "stop": 1.0,
}
pag_guider_spec = ComponentSpec(name="guider", type_hint=PerturbedAttentionGuidance, config=pag_guider_spec_config, default_creation_method="from_config")
cfg_guider_spec = ComponentSpec(name="guider", type_hint=ClassifierFreeGuidance, config={"guidance_scale": 5.0}, default_creation_method="from_config")
apg_guider_spec = ComponentSpec(name="guider", type_hint=AdaptiveProjectedGuidance, config={"guidance_scale": 15.0, "adaptive_projected_guidance_momentum": -0.3, "adaptive_projected_guidance_rescale": 12.0, "start": 0.01}, default_creation_method="from_config")

# code to push them to the hub
# pag_guider_spec.create().push_to_hub(modular_repo, subfolder="pag_guider")
# cfg_guider_spec.create().push_to_hub(modular_repo, subfolder="cfg_guider")
# apg_guider_spec.create().push_to_hub(modular_repo, subfolder="apg_guider")

# (4) create components manager and load the pipeline
components = ComponentsManager()
auto_pipeline = auto_blocks.init_pipeline(modular_repo, components_manager=components, collection="sdxl_auto")
#auto_pipeline.save_pretrained(modular_repo, push_to_hub=True)
auto_pipeline.load_default_components(torch_dtype=dtype)


print(f" ")
print(f"auto_pipeline:")
print(auto_pipeline)
print(f" loader components:")
for key, value in auto_pipeline.components.items():
    if isinstance(value, torch.nn.Module):
        print(f" {key}: {value.__class__.__name__}, dtype: {value.dtype}, device: {value.device}")


# enable auto cpu offload: automatically offload models when available gpu memory go below a certain threshold
components.enable_auto_cpu_offload(device=device)
print(components)
reset_memory()



# using auto_pipeline to generate images

# to get info about auto_pipeline and how to use it: inputs/outputs/components
# this is an "auto" workflow that works for all use cases: text2img, img2img, inpainting, controlnet, etc.
print(f" ")
print(f" auto_pipeline:")
print(auto_pipeline)
print(" ")


print(f" ")
print(f" auto_pipeline.blocks:")
print(auto_pipeline.blocks)
print(" ")

# since we want to use text2img use case, we can run the following to see components/blocks/inputs for this use case
print(f" ")
print(f" auto_pipeline info (default use case: text2img)")
print(auto_pipeline.blocks.get_execution_blocks())
print(" ")


# test1: text2img use case
# when you run the auto workflow, you will get these logs telling you which blocks are actually running
# (should match what the sdxl_node told you)
# Running block: StableDiffusionXLBeforeDenoiseStep, trigger: None
# Running block: StableDiffusionXLDenoiseStep, trigger: None
# Running block: StableDiffusionXLDecodeStep, trigger: None


if 1 in tests_to_run:
    generator = torch.Generator(device="cuda").manual_seed(0)
    images_output = auto_pipeline(
        prompt=prompt, 
        num_images_per_prompt=num_images_per_prompt,
        generator=generator, 
        output="images"
    )
    for i, image in enumerate(images_output):
        image.save(f"{out_folder}/test1_out_text2img_{i}.png")
    print(f" save modular output ({len(images_output)} images) to {out_folder}/test1_out_text2img.png")

clear_memory()



# test2: text2img with lora use case
print(f" ")
print(f" running test2: text2img with lora use case")
auto_pipeline.load_lora_weights("rajkumaralma/dissolve_dust_style", weight_name="ral-dissolve-sdxl.safetensors", adapter_name="ral-dissolve")
if 2 in tests_to_run:
    generator = torch.Generator(device="cuda").manual_seed(0)
    images_output = auto_pipeline(
        prompt=prompt, 
        num_images_per_prompt=num_images_per_prompt,
        generator=generator, 
        output="images"
    )
    for i, image in enumerate(images_output):
        image.save(f"{out_folder}/test2_out_text2img_lora_{i}.png")
    print(f" save modular output ({len(images_output)} images) to {out_folder}/test2_out_text2img_lora.png")

# test3:text2image with pag
print(f" ")
print(f" running test3:text2image with pag")
if not test_lora:
    auto_pipeline.unload_lora_weights()
auto_pipeline.update_components(guider=pag_guider_spec)

if 3 in tests_to_run:
    generator = torch.Generator(device="cuda").manual_seed(0)
    images_output = auto_pipeline(
        prompt=prompt, 
        num_images_per_prompt=num_images_per_prompt,
        generator=generator,
        output="images"
    )

    for i, image in enumerate(images_output):
        image.save(f"{out_folder}/test3_out_text2img_pag_{i}.png")
    print(f" save modular output ({len(images_output)} images) to {out_folder}/test3_out_text2img_pag.png")

clear_memory()
# check out the components if you want, the models used are moved to device, some might get offloaded to cpu
# print(components)


# test4: SDXL(text2img) with ip_adapter + pag
print(f" ")
print(f" running test4: SDXL(text2img) with ip_adapter")

auto_pipeline.load_ip_adapter(ip_adapter_repo, subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
auto_pipeline.set_ip_adapter_scale(0.6)

if 4 in tests_to_run: 
    generator = torch.Generator(device="cuda").manual_seed(0)
    images_output = auto_pipeline(
        prompt=prompt, 
        num_images_per_prompt=num_images_per_prompt,
        generator=generator, 
        ip_adapter_image=ip_adapter_image,
        output="images"
    )

    for i, image in enumerate(images_output):
        image.save(f"{out_folder}/test4_out_text2img_ip_adapter_{i}.png")
    print(f" save modular output ({len(images_output)} images) to {out_folder}/test4_out_text2img_ip_adapter.png")

auto_pipeline.unload_ip_adapter()
clear_memory()

# test5: SDXL(text2img) with controlnet

# we are going to pass a new input now `control_image` so the workflow will be automatically converted to controlnet use case
# let's checkout the info for controlnet use case
print(f" auto_pipeline info (controlnet use case)")
print(auto_pipeline.blocks.get_execution_blocks("control_image"))
print(" ")

print(f" ")
print(f" running test5: SDXL(text2img) with controlnet")

if 5 in tests_to_run:
    generator = torch.Generator(device="cuda").manual_seed(0)
    images_output = auto_pipeline(
        prompt=prompt, 
        control_image=control_image, 
        controlnet_conditioning_scale=controlnet_conditioning_scale,
        num_images_per_prompt=num_images_per_prompt,
        generator=generator,
        output="images"
    )

    for i, image in enumerate(images_output):
        image.save(f"{out_folder}/test5_out_text2img_control_{i}.png")
    print(f" save modular output ({len(images_output)} images) to {out_folder}/test5_out_text2img_control.png")

clear_memory()


# test6: SDXL(img2img)

print(f" ")
print(f" running test6: SDXL(img2img)")

# let's checkout the sdxl_node info for img2img use case
print(f" auto_pipeline info (img2img use case)")
print(auto_pipeline.blocks.get_execution_blocks("image"))
print(" ")

if 6 in tests_to_run:
    generator = torch.Generator(device="cuda").manual_seed(0)
    images_output = auto_pipeline(
        prompt=prompt, 
        image=init_image, 
        strength=strength, 
        num_images_per_prompt=num_images_per_prompt,
        generator=generator, 
        output="images"
    )
    for i, image in enumerate(images_output):
        image.save(f"{out_folder}/test6_out_img2img_{i}.png")
    print(f" save modular output ({len(images_output)} images) to {out_folder}/test6_out_img2img.png")

clear_memory()


# test7: SDXL(img2img) with controlnet
# let's checkout the sdxl_node info for img2img controlnet use case
print(f" sdxl_node info (img2img controlnet use case)")
print(auto_pipeline.blocks.get_execution_blocks("image", "control_image"))
print(" ")

print(f" ")
print(f" running test7: SDXL(img2img) with controlnet")
if 7 in tests_to_run:
    generator = torch.Generator(device="cuda").manual_seed(0)
    images_output = auto_pipeline(
        prompt=prompt, 
        image=init_image, 
        strength=strength, 
        num_images_per_prompt=num_images_per_prompt,
        control_image=control_image, 
        controlnet_conditioning_scale=controlnet_conditioning_scale, 
        generator=generator, 
        output="images"
    )

    for i, image in enumerate(images_output):
        print(f"image: {image.size}")
        image.save(f"{out_folder}/test7_out_img2img_control_{i}.png")
    print(f" save modular output ({len(images_output)} images) to {out_folder}/test7_out_img2img_control.png")

clear_memory()

# test8: img2img with refiner

# test refiner pipeline but not using a repo
refiner_pipeline = refiner_blocks.init_pipeline(components_manager=components, collection="refiner")

print(f" ")
print(f" after setup refiner loader (initial setup, should be empty)")
print(refiner_pipeline)
print(f" ")


refiner_components = components.search_components("!unet|text_encoder|tokenizer|guider", collection="sdxl_auto")
print(f" reuse these components for refiner pipeline:")
for name, component in refiner_components.items():
    print(f" {name}: {component.__class__.__name__}")
print(f" ")


refiner_pipeline.update_components(**refiner_components, unet=refiner_spec.load(torch_dtype=dtype), force_zeros_for_empty_prompt=False, requires_aesthetics_score=True)
print(f" ")
print(f" refiner loader after update")
print(refiner_pipeline)
print(f" ")

print(f" ")
print(f" ")
print(f" components info")
print(components)
print(f" ")


print(f" running test8: img2img with refiner (reuse components from components manager)")

if 8 in tests_to_run:
    print(f" ")
    print(f" step1 run auto pipeline to get latents")
    generator = torch.Generator(device="cuda").manual_seed(0)
    latents = auto_pipeline(
        prompt=prompt, 
        num_images_per_prompt=num_images_per_prompt,
        generator=generator, 
        denoising_end=0.8,
        output="images",
        output_type="latent",
    )
    print(f" ")
    print(f" step2 run refiner pipeline to get images")
    images_output = refiner_pipeline(
        image_latents=latents,  
        prompt=prompt, 
        denoising_start=0.8, 
        generator=generator, 
        num_images_per_prompt=num_images_per_prompt,
        output="images"
    )
    for i, image in enumerate(images_output):
        image.save(f"{out_folder}/test8_out_img2img_refiner_{i}.png")
    print(f" save modular output ({len(images_output)} images) to {out_folder}/test8_out_img2img_refiner.png")

clear_memory()

# test9: SDXL(inpainting)
# let's checkout the sdxl_node info for inpainting use case
print(f" auto_pipeline info (inpainting use case)")
print(auto_pipeline.blocks.get_execution_blocks("mask_image", "image"))
print(" ")

print(f" ") 
print(f" running test9: SDXL(inpainting)")

if 9 in tests_to_run:
    generator = torch.Generator(device="cuda").manual_seed(0)
    images_output = auto_pipeline(
        prompt=prompt, 
        image=inpaint_image, 
        mask_image=inpaint_mask, 
        height=1024, 
        width=1024, 
        generator=generator, 
        num_images_per_prompt=num_images_per_prompt,
        strength=inpaint_strength,  # make sure to use `strength` below 1.0
        output="images"
    )
    for i, image in enumerate(images_output):
        image.save(f"{out_folder}/test9_out_inpainting_{i}.png")
    print(f" save modular output ({len(images_output)} images) to {out_folder}/test9_out_inpainting.png")

clear_memory()

# test10: SDXL(inpainting) with controlnet
# let's checkout the sdxl_node info for inpainting + controlnet use case
print(f" auto_pipeline info (inpainting + controlnet use case)")
print(auto_pipeline.blocks.get_execution_blocks("mask_image", "control_image"))
print(" ")

print(f" ") 
print(f" running test10: SDXL(inpainting) with controlnet")

if 10 in tests_to_run:
    generator = torch.Generator(device="cuda").manual_seed(0)
    images_output = auto_pipeline(
        prompt=prompt, 
        control_image=control_image, 
        image=inpaint_image,
        height=1024,
        width=1024,
        mask_image=inpaint_mask,
        num_images_per_prompt=num_images_per_prompt,
        controlnet_conditioning_scale=controlnet_conditioning_scale, 
        strength=inpaint_strength,  # make sure to use `strength` below 1.0
        generator=generator,
        output="images"
    )
    for i, image in enumerate(images_output):
      image.save(f"{out_folder}/test10_out_inpainting_control_{i}.png")
    print(f" save modular output ({len(images_output)} images) to {out_folder}/test10_out_inpainting_control.png")

clear_memory()

# test11: SDXL(inpainting) with inpaint_unet
print(f" ") 
print(f" running test11: SDXL(inpainting) with inpaint_unet")

inpaint_unet = inpaint_spec.load(torch_dtype=dtype)
# make a backup to switch back later
sdxl_unet_spec = ComponentSpec.from_component("unet", auto_pipeline.unet)
auto_pipeline.update_components(unet=inpaint_unet)
if 11 in tests_to_run:
    generator = torch.Generator(device="cuda").manual_seed(0)
    images_output = auto_pipeline(
        prompt=prompt, 
        image=inpaint_image, 
        mask_image=inpaint_mask, 
        height=1024, 
        width=1024, 
        generator=generator, 
        num_images_per_prompt=num_images_per_prompt,
        output="images"
    )
    for i, image in enumerate(images_output):
        image.save(f"{out_folder}/test11_out_inpainting_inpaint_unet_{i}.png")
    print(f" save modular output ({len(images_output)} images) to {out_folder}/test11_out_inpainting_inpaint_unet.png")

clear_memory()
print(f" after update with inpaint_unet")
print(components)


# test12: SDXL(inpainting) with inpaint_unet + padding_mask_crop
print(f" ") 
print(f" running test12: SDXL(inpainting) with inpaint_unet (padding_mask_crop=33)")

if 12 in tests_to_run:
    generator = torch.Generator(device="cuda").manual_seed(0)
    images_output = auto_pipeline(
        prompt=prompt, 
        image=inpaint_image, 
        mask_image=inpaint_mask, 
        height=1024, 
        width=1024, 
        generator=generator, 
        padding_mask_crop=33, 
        num_images_per_prompt=num_images_per_prompt,
        strength=inpaint_strength,  # make sure to use `strength` below 1.0
        output="images"
    )

    for i, image in enumerate(images_output):
        image.save(f"{out_folder}/test12_out_inpainting_inpaint_unet_padding_mask_crop_{i}.png")
    print(f" save modular output ({len(images_output)} images) to {out_folder}/test12_out_inpainting_inpaint_unet_padding_mask_crop.png")

clear_memory()


# test13: apg

print(f" ")
print(f" running test13: apg")

auto_pipeline.update_components(guider=apg_guider_spec, unet=sdxl_unet_spec.load(torch_dtype=dtype))
print(f" autopipeline loader after update with apg guider and unet")
print(auto_pipeline)
print(f" ")

print(f" ")
print(f" components info")
print(components)
print(f" ")

if 13 in tests_to_run:
    generator = torch.Generator().manual_seed(0)
    images_output = auto_pipeline(
      prompt=prompt, 
      generator=generator,
      num_inference_steps=20,
      num_images_per_prompt=1, # yiyi: apg does not work with num_images_per_prompt > 1
      height=896,
      width=768,
      output="images"
    )

    for i, image in enumerate(images_output):
        image.save(f"{out_folder}/test13_out_apg_{i}.png")
    print(f" save modular output ({len(images_output)} images) to {out_folder}/test13_out_apg.png")

clear_memory()


# test14: SDXL(text2img) with controlnet_union

auto_pipeline.update_components(
    controlnet=controlnet_union_spec.load(torch_dtype=dtype), 
    guider=pag_guider_spec
)

print(f" autopipeline loader after update with controlnet (controlnet_union), unet (sdxl_auto), and guider (pag_guider)")
print(auto_pipeline)
print(f" ")

print(f" ")
print(f" components info")
print(components)
print(f" ")

# we are going to pass a new input now `control_mode` so the workflow will be automatically converted to controlnet use case
# let's checkout the info for controlnet use case
print(f" auto_pipeline info (controlnet union use case)")
print(auto_pipeline.blocks.get_execution_blocks("control_mode"))
print(" ")
print(f" ")
print(f" running test14: SDXL(text2img) with controlnet_union")

if 14 in tests_to_run:
    generator = torch.Generator(device="cuda").manual_seed(0)

    images_output = auto_pipeline(
        prompt=prompt, 
        control_mode=[3],
        control_image=[controlnet_union_image], 
        num_images_per_prompt=num_images_per_prompt,
        height=1024,
        width=1024,
        generator=generator,
        output="images"
    )

    for i, image in enumerate(images_output):
        image.save(f"{out_folder}/test14_out_text2img_control_union_{i}.png")
    print(f" save modular output ({len(images_output)} images) to {out_folder}/test14_out_text2img_control_union.png")

clear_memory()


# test15: SDXL(img2img) with controlnet_union

print(f" ")
print(f" auto_pipeline info (img2img controlnet union use case)")
print(auto_pipeline.blocks.get_execution_blocks("image", "control_mode"))
print(" ")

print(f" ")
print(f" running test15: SDXL(img2img) with controlnet_union")

if 15 in tests_to_run:
    generator = torch.Generator(device="cuda").manual_seed(0)
    images_output = auto_pipeline(
        prompt=prompt, 
        image=init_image, 
        generator=generator, 
        control_mode=[3], 
        control_image=[controlnet_union_image], 
        num_images_per_prompt=num_images_per_prompt, 
        height=1024, 
        width=1024, 
        output="images"
    )

    for i, image in enumerate(images_output):
        image.save(f"{out_folder}/test15_out_img2img_control_union_{i}.png")
    print(f" save modular output ({len(images_output)} images) to {out_folder}/test15_out_img2img_control_union.png")

clear_memory()

# test16: SDXL(inpainting) with controlnet_union
print(f" ")
print(f" auto_pipeline info (inpainting controlnet union use case)")
print(auto_pipeline.blocks.get_execution_blocks("mask", "control_mode"))
print(" ")

print(f" ")
print(f" running test16: SDXL(inpainting) with controlnet_union")

if 16 in tests_to_run:

    generator = torch.Generator(device="cuda").manual_seed(0)
    images_output = auto_pipeline(
        prompt=prompt, 
        image=init_image, 
        mask_image=inpaint_mask, 
        control_image=controlnet_union_image,
        control_mode=[3],
        height=1024, 
        width=1024, 
        generator=generator, 
        output="images"
    )

    for i, image in enumerate(images_output):
        image.save(f"{out_folder}/test16_out_inpainting_control_union_{i}.png")
    print(f" save modular output ({len(images_output)} images) to {out_folder}/test16_out_inpainting_control_union.png")

clear_memory()

print_memory("the end")

print(f" components info after the end")
print(components)

Collaborator

@yiyixuxu yiyixuxu left a comment

thanks so much @DN6
this is MUCH BETTER!!!

the change looks very nice to me, I left a few comments.

@sayakpaul
Member

@DN6 I am trying to make it work with a custom canny block with ControlNet:

Code
from diffusers.modular_pipelines.stable_diffusion_xl import IMAGE2IMAGE_BLOCKS
from diffusers.modular_pipelines.stable_diffusion_xl.modular_blocks import (
    StableDiffusionXLControlNetInputStep,
    StableDiffusionXLControlNetDenoiseStep
)
from diffusers.utils import load_image
from diffusers.modular_pipelines import SequentialPipelineBlocks, ModularPipelineBlocks
import torch


dtype = torch.float16
i2i_blocks = IMAGE2IMAGE_BLOCKS.copy()

control_input_block = StableDiffusionXLControlNetInputStep()
canny_filter = ModularPipelineBlocks.from_pretrained(
    "diffusers-internal-dev/canny-filtering",
    trust_remote_code=True
)
# print(f"{control_input_block=}")
print(canny_filter)
i2i_blocks.insert("canny", canny_filter, 1)
i2i_blocks.insert("controlnet_input", control_input_block, 7)
i2i_blocks["denoise"] = StableDiffusionXLControlNetDenoiseStep()
print(f"{i2i_blocks=}")

i2i_pipe = SequentialPipelineBlocks.from_blocks_dict(i2i_blocks)
repo_id = "YiYiXu/modular-loader-t2i-0704"
pipe = i2i_pipe.init_pipeline(repo_id)
pipe.load_default_components(torch_dtype=dtype, device_map="cuda", trust_remote_code=True)
pipe.load_components(
    ["controlnet"], repo="diffusers/controlnet-canny-sdxl-1.0", torch_dtype=dtype
)
# print(pipe.components)

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true")
image = image.resize((1024, 1024))

output = pipe(
    prompt="red car floating in the blue sky, upside down",
    image=image,
    num_inference_steps=25,
    controlnet_conditioning_scale=0.5,
)
output.get_intermediate("images")[0].save("canny_modular.png")

I am getting:

Error
Traceback (most recent call last):
  File "/fsx/sayak/diffusers/push_canny_block.py", line 43, in <module>
    output = pipe(
  File "/fsx/sayak/diffusers/src/diffusers/modular_pipelines/modular_pipeline.py", line 2436, in __call__
    _, state = self.blocks(self, state)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/fsx/sayak/diffusers/src/diffusers/modular_pipelines/modular_pipeline.py", line 921, in __call__
    pipeline, state = block(pipeline, state)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/fsx/sayak/.cache/modules/diffusers_modules/local/diffusers-internal-dev--canny-filtering/17ad199d5e5ed3366665bdec880ce1fbacbf2e79/canny_block.py", line 77, in __call__
    block_state.control_image = self.compute_canny(
  File "/fsx/sayak/.cache/modules/diffusers_modules/local/diffusers-internal-dev--canny-filtering/17ad199d5e5ed3366665bdec880ce1fbacbf2e79/canny_block.py", line 64, in compute_canny
    canny_map = components.canny_annotator(
TypeError: 'NoneType' object is not callable

In my Canny block, I have:

...
    @property
    def expected_components(self):
        return [
            ComponentSpec(name="canny_annotator", type_hint=CannyDetector),
        ]
...

I am guessing it's happening because there's no from_pretrained() or from_config() involved here? How should cases like this be handled?

@sayakpaul
Member

sayakpaul commented Aug 8, 2025

Hmm I was able to make the code run (updated code below). But I am not getting the same outputs as a regular pipeline.

Updated
from diffusers.modular_pipelines.stable_diffusion_xl import IMAGE2IMAGE_BLOCKS
from diffusers.modular_pipelines.stable_diffusion_xl.modular_blocks import (
    StableDiffusionXLControlNetInputStep,
    StableDiffusionXLControlNetDenoiseStep
)
from diffusers.utils import load_image
from diffusers.modular_pipelines import SequentialPipelineBlocks, ModularPipelineBlocks
import torch


dtype = torch.float16
i2i_blocks = IMAGE2IMAGE_BLOCKS.copy()

control_input_block = StableDiffusionXLControlNetInputStep()
canny_filter = ModularPipelineBlocks.from_pretrained(
    "diffusers-internal-dev/canny-filtering",
    trust_remote_code=True
)
# print(f"{control_input_block=}")
print(canny_filter)
i2i_blocks.insert("canny", canny_filter, 1)
i2i_blocks.insert("controlnet_input", control_input_block, 7)
i2i_blocks["denoise"] = StableDiffusionXLControlNetDenoiseStep()
print(f"{i2i_blocks=}")

i2i_pipe = SequentialPipelineBlocks.from_blocks_dict(i2i_blocks)
repo_id = "YiYiXu/modular-loader-t2i-0704"
pipe = i2i_pipe.init_pipeline(repo_id)
pipe.load_default_components(torch_dtype=dtype, trust_remote_code=True)
pipe.load_components(
    ["controlnet"], repo="diffusers/controlnet-canny-sdxl-1.0", torch_dtype=dtype
)
pipe.to("cuda")

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true")
image = image.resize((1024, 1024))

output = pipe(
    prompt="red car",
    negative_prompt="bad color, unrealistic, bad formation, bad colors",
    image=image,
    num_inference_steps=28,
    generator=torch.manual_seed(0),
    controlnet_conditioning_scale=0.5,
    strength=1.0, # It shouldn't really be needed in ControlNet, no? Otherwise, the default 0.3 strength is applied.
)
output.values["images"][0].save("canny_modular.png")

Notes:

Non-modular code:

from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image
from PIL import Image
import torch
import numpy as np
from controlnet_aux import CannyDetector

prompt = "red car"
negative_prompt = "bad color, unrealistic, bad formation, bad colors"

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true")
image = image.resize((1024, 1024))

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

canny_filter = CannyDetector()
image = canny_filter(image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024)
images = pipe(
    prompt=prompt, 
    negative_prompt=negative_prompt, 
    image=image, 
    generator=torch.manual_seed(0),
    controlnet_conditioning_scale=0.5,
    num_inference_steps=28,
).images

images[0].save(f"hug_lab.png")

The only obvious difference is that we have 5.0 as the guidance_scale in the SDXL ControlNet pipeline:

whereas we have 7.5 as the guidance_scale in the guider component:

config=FrozenDict({"guidance_scale": 7.5}),

I tried passing guidance_scale=5.0 in the modular inference code above but it said guidance_scale isn't expected.

(comparison images: Regular vs. Modular output)

What am I missing?

@sayakpaul
Member

sayakpaul commented Aug 8, 2025

Prompt expander block works as expected but there's a problem in the Flux modular pipeline probably due to some changes in this PR:

Code
import os
os.environ["GOOGLE_API_KEY"] = "..."

import torch
from diffusers.modular_pipelines import SequentialPipelineBlocks, ModularPipelineBlocks
from diffusers.modular_pipelines.flux.modular_blocks import TEXT2IMAGE_BLOCKS


model_id = "black-forest-labs/FLUX.1-dev"
my_blocks = TEXT2IMAGE_BLOCKS.copy()
expander_block = ModularPipelineBlocks.from_pretrained(
    "diffusers-internal-dev/gemini-prompt-expander",
    trust_remote_code=True,
)
my_blocks.insert("prompt_expander", expander_block, 0)
print(f"{my_blocks=}")
blocks = SequentialPipelineBlocks.from_blocks_dict(my_blocks)

pipeline = blocks.init_pipeline()
pipeline.load_components(["text_encoder"], repo=model_id, subfolder="text_encoder", torch_dtype=torch.bfloat16)
pipeline.load_components(["tokenizer"], repo=model_id, subfolder="tokenizer")
pipeline.load_components(["text_encoder_2"], repo=model_id, subfolder="text_encoder_2", torch_dtype=torch.bfloat16)
pipeline.load_components(["tokenizer_2"], repo=model_id, subfolder="tokenizer_2")
pipeline.load_components(["scheduler"], repo=model_id, subfolder="scheduler")
pipeline.load_components(["transformer"], repo=model_id, subfolder="transformer", torch_dtype=torch.bfloat16)
pipeline.load_components(["vae"], repo=model_id, subfolder="vae", torch_dtype=torch.bfloat16)
pipeline.to("cuda")

prompt = "A cat and a dog."
output = pipeline(
    prompt=prompt, num_inference_steps=28, guidance_scale=3.5, generator=torch.manual_seed(0)
)
output.values["images"][0].save("modular_expander_flux.png")
Error
Traceback (most recent call last):
  File "/fsx/sayak/diffusers/upload_prompt_expanded.py", line 30, in <module>
    output = pipeline(
  File "/fsx/sayak/diffusers/src/diffusers/modular_pipelines/modular_pipeline.py", line 2431, in __call__
    _, state = self.blocks(self, state)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/fsx/sayak/diffusers/src/diffusers/modular_pipelines/modular_pipeline.py", line 917, in __call__
    pipeline, state = block(pipeline, state)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/fsx/sayak/diffusers/src/diffusers/modular_pipelines/flux/before_denoise.py", line 383, in __call__
    block_state.height = block_state.height or components.default_height
  File "/fsx/sayak/diffusers/src/diffusers/configuration_utils.py", line 144, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'ModularPipeline' object has no attribute 'default_height'

@asomoza
Member

asomoza commented Aug 8, 2025

What am I missing?

@sayakpaul you have to use a guider to change the guidance_scale, see this.

@sayakpaul
Member

@asomoza thanks! Do you see anything else missing in ControlNet snippet?

@asomoza
Member

asomoza commented Aug 8, 2025

@sayakpaul I don't see anything else missing, however I haven't tested my code with this PR yet. Also, I believe @yiyixuxu said that modular diffusers will not produce exactly the same outputs as a normal pipeline, so maybe you can try with something more specific to compare them, like asking for a photorealistic or painting result?

@sayakpaul
Member

so maybe you can try with something more specific to compare them, like asking for a photorealistic or painting result?

Oh I used some specific comparisons in my testing for Flux T2I and I2I and they matched. Hence, I wondered a bit. I will test with the guider component like you suggested.

Also @DN6, an interesting observation regarding #11969 (comment): if I skip inserting, i.e. my_blocks.insert("prompt_expander", expander_block, 0), the code runs fine. Is that expected?

@sayakpaul
Member

Okay I see what happened. I had to add model_name = "flux" to the beginning of the custom block: https://huggingface.co/diffusers-internal-dev/gemini-prompt-expander/blob/main/prompt_expander.py#L22

And now prompt expander works:

Original prompt: A cat and a dog. Expanded prompt: A majestic, fluffy ginger cat with piercing emerald green eyes is perched regally on a sun-drenched windowsill, its gaze fixed on a playful Golden Retriever puppy that is joyfully chasing a brightly colored butterfly in a vibrant meadow. The sunlight streams through the window, casting a warm, golden glow on the cat's fur and highlighting the soft, dappled light filtering through the leaves of nearby trees. The meadow is alive with a riot of color – deep crimson poppies, sunny yellow buttercups, and delicate lavender wildflowers sway gently in a soft breeze. The puppy's fur is a rich, golden hue, catching the light as it bounds through the tall grass, its tail wagging with unbridled enthusiasm. The background is a soft-focus blur of lush green foliage and a clear, azure sky, creating a sense of depth and tranquility. The overall mood is one of innocent joy, warmth, and the simple beauty of nature. Photorealistic, wide-angle lens, shallow depth of field.

Result: (image)

I think the custom block could be made free of model_name constraint, no?

@DN6
Collaborator Author

DN6 commented Aug 8, 2025

I think the custom block could be made free of model_name constraint, no?

@sayakpaul You shouldn't need it. Florence block doesn't have it
https://huggingface.co/diffusers-internal-dev/florence2-image-annotator/blob/main/block.py#L16

@sayakpaul
Member

@DN6 yes, that is what I am expecting. But not specifying the model_name makes the pipeline class a generic ModularPipeline instead of FluxModularPipeline. This is what causes the issue.

I saw in your prompt caching custom block, you have it specified. Could it be Flux specific? 👀

@yiyixuxu
Collaborator

yiyixuxu commented Aug 8, 2025

@sayakpaul
regarding the canny_control example

for the output, this is expected

I didn't have get_intermediates() available on the output variable

Looking through your code, I noticed the non-modular pipeline is SDXL controlnet, while the modular one you put together is SDXL image-to-image controlnet - I think that's why the outputs are different.

@yiyixuxu
Collaborator

yiyixuxu commented Aug 8, 2025

@DN6 @sayakpaul

Indeed, we should not need to specify model_name for custom blocks (if it is a block that can work with different pipelines). I didn't think through this use case before, so it is nice that this issue came up in the discussion!

I think currently a custom block without model_name can work on its own

e.g. if you just run the block standalone like this, it's fine:

expander_block = ModularPipelineBlocks.from_pretrained(
    "diffusers-internal-dev/gemini-prompt-expander",
    trust_remote_code=True,
)
expander = expander_block.init_pipeline()
enhanced_prompt = expander(prompt, ...)

but it may have issues when mixed with other blocks, i.e. it can cause the assembled blocks to have model_name None, and when we convert the blocks into a pipeline, it results in a ModularPipeline instead of a FluxModularPipeline or StableDiffusionXLModularPipeline

e.g. for this, I would expect the model_name to be stable-diffusion-xl, but it's currently None:

import torch
from diffusers.modular_pipelines import SequentialPipelineBlocks
from diffusers.modular_pipelines.stable_diffusion_xl import ALL_BLOCKS

# Create modular blocks and separate text encoding and decoding steps
blocks = SequentialPipelineBlocks.from_blocks_dict(ALL_BLOCKS["text2img"])

from diffusers.modular_pipelines import ModularPipelineBlocks, InputParam, OutputParam
import torch

class TestBlock(ModularPipelineBlocks):
    
    @property
    def inputs(self):
        return [InputParam(name="x", default=0)]
        
        
    @property
    def description(self):
        return "test"
        
    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        block_state.x = block_state.x + 1
        self.set_block_state(state, block_state)
        return components, state



blocks.sub_blocks.insert("test", TestBlock(), 0)
print(f"blocks.model_name: {blocks.model_name}")

to fix, for now, we can just change this

https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline.py#L1070C1-L1070C63

to return next((block.model_name for block in self.sub_blocks.values() if block.model_name is not None), None)

basically skip any None values
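For illustration, here is that property on a small stand-in class (not the actual diffusers class; attribute names assumed from the linked code):

class CompositeBlocksSketch:
    def __init__(self, sub_blocks):
        # sub_blocks: ordered mapping of block name -> block instance
        self.sub_blocks = sub_blocks

    @property
    def model_name(self):
        # take the first non-None model_name among sub-blocks, so generic custom
        # blocks (model_name=None) no longer mask e.g. "stable-diffusion-xl" or "flux"
        return next(
            (block.model_name for block in self.sub_blocks.values() if block.model_name is not None),
            None,
        )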

what do you think?

@a-r-r-o-w
Member

thanks for the cleanup! it's a massive improvement IMO! I left some comments & to-dos for future. Let's merge it soon! (I did not test for wan, not sure if it breaks anything, i think @a-r-r-o-w can help test in his PR)

Oh, sorry for the delay if this is pending because of me! Will test right away.

@sayakpaul
Member

@yiyixuxu sounds good to me! Should also add a small comment in case we have to revisit later.

@sayakpaul
Member

sayakpaul commented Aug 9, 2025

@yiyixuxu your hunch was right. I just tried ControlNet with T2I and it worked like a charm with the Canny custom block.

Code
from diffusers.modular_pipelines.stable_diffusion_xl import TEXT2IMAGE_BLOCKS
from diffusers.modular_pipelines.stable_diffusion_xl.modular_blocks import (
    StableDiffusionXLControlNetInputStep,
    StableDiffusionXLControlNetDenoiseStep
)
from diffusers.utils import load_image
from diffusers import ClassifierFreeGuidance
from diffusers.modular_pipelines import SequentialPipelineBlocks, ModularPipelineBlocks
import torch


dtype = torch.float16
my_blocks = TEXT2IMAGE_BLOCKS.copy()

control_input_block = StableDiffusionXLControlNetInputStep()
canny_filter = ModularPipelineBlocks.from_pretrained(
    "diffusers-internal-dev/canny-filtering",
    trust_remote_code=True
)
my_blocks.insert("canny", canny_filter, 1)
my_blocks.insert("controlnet_input", control_input_block, 6)
my_blocks["denoise"] = StableDiffusionXLControlNetDenoiseStep()

i2i_pipe = SequentialPipelineBlocks.from_blocks_dict(my_blocks)
repo_id = "YiYiXu/modular-loader-t2i-0704"
pipe = i2i_pipe.init_pipeline(repo_id)
pipe.load_default_components(torch_dtype=dtype, trust_remote_code=True)
pipe.load_components(
    ["controlnet"], repo="diffusers/controlnet-canny-sdxl-1.0", torch_dtype=dtype
)

guider = ClassifierFreeGuidance(guidance_scale=5.0)
pipe.update_components(guider=guider)
pipe.to("cuda")

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true")
image = image.resize((1024, 1024))

output = pipe(
    prompt="red car",
    negative_prompt="bad color, unrealistic, bad formation, bad colors",
    image=image,
    num_inference_steps=28,
    generator=torch.manual_seed(0),
    controlnet_conditioning_scale=0.5,
)
output.values["images"][0].save("canny_modular.png")
I can get my red vintage car as well: (image)

@asomoza
Member

asomoza commented Aug 9, 2025

Oh, I completely missed that. It seems like it can be a common mistake, since it slipped past both @sayakpaul and me without us noticing.

I have a question then: shouldn't the img2img controlnet one throw an error, since it was missing control_image, which is required?

Also, in the working example, shouldn't it be control_image too? Why does it work with image?

@sayakpaul
Member

Because we have a custom component inside the blocks which sets control_image:
https://huggingface.co/diffusers-internal-dev/canny-filtering/blob/main/canny_block.py#L55
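For illustration, a stripped-down sketch of what such a block does (hypothetical class, heavily simplified from the linked file): it reads image from the block state and writes the computed canny map back as control_image, so the downstream ControlNet blocks find it even though the user never passed control_image.

from diffusers.modular_pipelines import ModularPipelineBlocks, InputParam, OutputParam

class CannyFilterSketch(ModularPipelineBlocks):
    @property
    def description(self):
        return "computes a canny map from `image` and stores it as `control_image`"

    @property
    def inputs(self):
        return [InputParam(name="image")]

    @property
    def intermediate_outputs(self):
        return [OutputParam(name="control_image")]

    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        # the real block runs its canny annotator component here
        block_state.control_image = components.canny_annotator(block_state.image)
        self.set_block_state(state, block_state)
        return components, state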

@asomoza
Member

asomoza commented Aug 9, 2025

Oh I see, thanks for the clarification.

@DN6 DN6 changed the title [WIP Modular] More Updates for Custom Code Loading [Modular] More Updates for Custom Code Loading Aug 11, 2025
@DN6 DN6 merged commit 630d27f into main Aug 11, 2025
18 checks passed